In today’s digital age, fraud has become a pervasive risk that reaches into many aspects of daily life. The fraud landscape is vast and growing, so it is important to understand the extent of fraudulent activity and its impact on individuals and the wider economy. This notebook explores the findings of a study by the Global Anti-Scam Alliance (GASA) and Feedzai.
Through this research, we aim to:
Fraud and Methods of Dissemination
A growing body of literature highlights the alarming prevalence of fraud in Canada. According to a report by the Canadian Anti-Fraud Centre (CAFC), Canadians reported more than 68,000 cases of fraud in 2023 alone, with financial losses exceeding CAD 531 million (Global Anti-Scam Alliance, 2023).
Psychological and Emotional Impact
The psychological and emotional toll of fraud is significant and often overlooked. Research has shown that victims of fraud experience a range of negative emotions, including shame, guilt and anxiety (Button et al., 2014). The survey found that 47% of fraud victims in Canada reported a severe emotional impact, and that these non-financial effects need to be addressed through support systems (Feedzai and Global Anti-Scam Alliance, 2024).
Economic Impact
The economic losses caused by fraud are enormous. One analysis estimated an average loss of CAD 2,406 per person, amounting to CAD 13 billion per year, or 0.5% of Canada’s GDP (Feedzai and Global Anti-Scam Alliance, 2024). This analysis, together with other studies (Cross, 2019), shows the heavy burden that fraud places on individuals and on the economy.
Reporting & Regulation
Despite the high prevalence of fraud, reporting rates remain low. The study found that 69% of fraud victims do not report their experiences to law enforcement, often because they are unsure where to report, find the procedures complex, or fear repercussions. This underreporting complicates efforts to quantify and combat fraud (Feedzai and Global Anti-Scam Alliance, 2024). A separate study emphasizes the importance of systematic reporting (Parti & Tahir, 2023).
The dataset was sourced from the Canadian Anti-Fraud Centre’s Fraud Reporting System database. The CAFC gathers the data through this system, primarily from public submissions. The accuracy of the data therefore depends on factors such as the completeness and reliability of the information users provide when submitting a report.
The variables used in the dataset are defined below:
| Variables | Definition |
|---|---|
| Date_Received | When the complaint was reported |
| Complaint_Received_Type | The channel through which the CAFC received the complaint (e.g., CAFC Website, Email, Phone) |
| Province_State | The province or territory where the scam occurred |
| Fraud_and_Cybercrime_Thematic_Categories | The type of fraud reported by the victim (selected from a drop-down list or described during a telephone report to a CAFC intake analyst) |
| Solicitation_Method | The initial method of contact between the fraudster and victim |
| Language_of_Correspondence | The language used during the fraud |
| Complaint_Type | Whether the complaint describes an attempted fraud or an actual victimization ("Attempt" or "Victim") |
| Number_of_Victims | Total number of victims associated with the reported instance(s) of fraud |
| Dollar_Loss | Total amount of money lost due to the instance(s) of fraud |
To prepare the data for analysis, the following data-cleansing steps were taken:
The expected values or expected format for each column are listed below:
## $Date_Received
## [1] "^[0-9]{4}-[0-9]{2}-[0-9]{2}$"
##
## $Complaint_Received_Type
## [1] "CAFC Website" "NCFRS" "Email" "Phone" "Mail"
## [6] "In Person" "Intercepted" "Message"
##
## $Province_State
## [1] "Saskatchewan" "Quebec"
## [3] "Ontario" "British Columbia"
## [5] "Yukon" "Alberta"
## [7] "Manitoba" "Prince Edward Island"
## [9] "Newfoundland And Labrador" "Nova Scotia"
## [11] "New Brunswick" "North West Territories"
## [13] "Nunavut"
##
## $Fraud_and_Cybercrime_Thematic_Categories
## [1] "Merchandise"
## [2] "Identity Fraud"
## [3] "Phishing"
## [4] "Vendor Fraud"
## [5] "Spear Phishing"
## [6] "Extortion"
## [7] "Emergency (Jail, Accident, Hospital, Help)"
## [8] "Job"
## [9] "Prize"
## [10] "Personal Info"
## [11] "Counterfeit Merchandise"
## [12] "Service"
## [13] "GRANT"
## [14] "Collection Agency"
## [15] "Bank Investigator"
## [16] "Investments"
## [17] "Romance"
## [18] "Spoofing"
## [19] "Loan"
## [20] "Unauthorized Charge"
## [21] "Charity / Donation"
## [22] "Foreign Money Offer"
## [23] "Office Supplies"
## [24] "Health"
## [25] "Timeshare"
## [26] "Psychics"
## [27] "False Billing"
## [28] "Vacation"
## [29] "Recovery Pitch"
## [30] "Survey"
## [31] "Fraudulent Cheque"
## [32] "Directory"
## [33] "Pyramid"
## [34] "Modem-Hijacking"
## [35] "Telecom Fraud"
## [36] "Credit Card"
##
## $Solicitation_Method
## [1] "Email" "Text message"
## [3] "Direct call" "Internet-social network"
## [5] "Internet" "Mail"
## [7] "Door to door/in person" "Print"
## [9] "Video Call" "Television"
## [11] "Radio"
##
## $Gender
## [1] "Male" "Female"
##
## $Language_of_Correspondence
## [1] "English" "French"
##
## $Victim_Age_Range
## [1] "'1 - 9" "'10 - 19" "'20 - 29"
## [4] "'30 - 39" "'40 - 49" "'50 - 59"
## [7] "'60 - 69" "'70 - 79" "'80 - 89"
## [10] "'90 - 99" "'100 +" "'Business / Entreprise"
##
## $Complaint_Type
## [1] "Attempt" "Victim"
##
## $Dollar_Loss
## [1] "^\\$\\d+\\.\\d{2}$"
##
## $Country
## [1] "Canada"
##
## $Number_of_Victims
## [1] 0 1
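The expected formats listed above can be checked row by row. The following is a minimal Python sketch of such a check, not the notebook's own cleaning code: the two regular expressions are transcribed from the listing (the doubled backslashes in the printed output are string escaping), while the function name `validate_row`, the choice of columns, and the sample rows are assumptions made for illustration.

```python
import re

# Patterns transcribed from the listing above. The printed output doubles
# the backslashes because of string escaping; the regex itself uses one.
DATE_PATTERN = re.compile(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")
DOLLAR_PATTERN = re.compile(r"^\$\d+\.\d{2}$")

# Allowed values for two of the categorical columns (a subset, for brevity).
ALLOWED = {
    "Complaint_Type": {"Attempt", "Victim"},
    "Language_of_Correspondence": {"English", "French"},
}


def validate_row(row):
    """Return the names of the columns whose values break the expected format."""
    problems = []
    if not DATE_PATTERN.match(row.get("Date_Received", "")):
        problems.append("Date_Received")
    if not DOLLAR_PATTERN.match(row.get("Dollar_Loss", "")):
        problems.append("Dollar_Loss")
    for column, allowed in ALLOWED.items():
        if row.get(column) not in allowed:
            problems.append(column)
    return problems


# Hypothetical rows, invented for illustration only.
good_row = {
    "Date_Received": "2023-01-15",
    "Dollar_Loss": "$1200.00",
    "Complaint_Type": "Victim",
    "Language_of_Correspondence": "English",
}
bad_row = {
    "Date_Received": "2023/01/15",
    "Dollar_Loss": "1,200",
    "Complaint_Type": "Unknown",
    "Language_of_Correspondence": "French",
}

print(validate_row(good_row))
print(validate_row(bad_row))
```

Returning the list of offending column names, rather than a single pass/fail flag, makes it easy to count which columns account for most of the rejected rows.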
The final dataset contains only expected values and has no missing data. It has 84,427 rows and 12 columns.
Notably, many fields contain no values, as illustrated in the following table:
Figure 1: Gender distribution of affected individuals
Figure 2: Relationship between Gender and the Most Frequent Scam/Fraud Categories
Figure 3: Relationship between Gender and the Least Frequent Scam/Fraud Categories
| Statistic | Value |
|---|---|
| Degrees of freedom (df) | 1 |
| X-squared | 21.9 |
| p-value | 2.88e-06 |
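The reported p-value can be sanity-checked from the test statistic alone. With one degree of freedom, a chi-square variable is the square of a standard normal variable, so the upper-tail probability reduces to a complementary error function. The Python sketch below uses only the rounded statistic from the table; the helper name `chi2_sf_df1` is an assumption, and the result should match the reported p-value only up to rounding.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    # For df = 1, X = Z^2 with Z standard normal, so
    # P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


x_squared = 21.9  # statistic as reported in the table (rounded)
p_value = chi2_sf_df1(x_squared)
print(f"p-value = {p_value:.2e}")
```

This shortcut holds only for one degree of freedom; tests with more degrees of freedom need the general chi-square distribution.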
Figure 4: Regression Analysis for Male
Figure 5: Regression Analysis for Female
Figure 6: Victim by Age and Complaint Type
Figure 7: Monetary Loss by Age and Gender
| Variable | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Victim_Age_Range | 10 | 1.22e+07 | 1.22e+06 | 1.06e+01 | 1.83e-15 |
| Fraud_Type | 32 | 4.62e+07 | 1.45e+06 | 1.25e+01 | 1.46e-39 |
| Residuals | 320 | 3.7e+07 | 1.16e+05 | | |
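The columns of this ANOVA table are linked by simple arithmetic: each mean square is the sum of squares divided by its degrees of freedom, and each F value is the term's mean square divided by the residual mean square. The Python sketch below recomputes those quantities from the rounded figures in the table, so small differences from the printed values are expected; the variable names are illustrative.

```python
# Values transcribed from the ANOVA table (rounded as printed).
terms = {
    "Victim_Age_Range": {"df": 10, "sum_sq": 1.22e7},
    "Fraud_Type": {"df": 32, "sum_sq": 4.62e7},
}
residual = {"df": 320, "sum_sq": 3.7e7}


def mean_square(term):
    """Mean square = sum of squares / degrees of freedom."""
    return term["sum_sq"] / term["df"]


ms_residual = mean_square(residual)

# F value = mean square of the term / mean square of the residuals.
f_values = {name: mean_square(term) / ms_residual for name, term in terms.items()}

for name, f_value in f_values.items():
    print(f"{name}: Mean Sq = {mean_square(terms[name]):.3g}, F = {f_value:.3g}")
```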
Figure 8: Analyzing the Relationship between Age, Gender, and Dollar Loss
The analysis indicates that both age and gender significantly influence financial losses due to fraud. Furthermore, the interaction between age and gender also plays a significant role. These findings highlight the need for targeted strategies to address vulnerabilities specific to different age and gender groups in combating fraud and cybercrime.
Figure 9: Fraud Incidents by Provinces in Canada
Figure 10: Fraud distribution by Province and Category
Intercept: The log odds of reporting a scam when all predictors are at the reference level.
Victim_Age_Range: A positive coefficient (0.256) indicates that with an increase in age range, the log odds of reporting a scam increase.
Gender Male: A positive coefficient (0.398) indicates that males are more likely to report a scam compared to the reference gender.
Type Of Fraud Y: A positive coefficient (0.562) suggests that this type of fraud is more likely to be reported.
Province State Z: A negative coefficient (-0.173) indicates that residents of this province/state are less likely to report scams compared to the reference province/state.
Significance: The p-values for all predictors are less than 0.05, indicating that they significantly affect the likelihood of reporting scams.
The logistic regression analysis shows that age, gender, type of fraud, and province/state significantly influence the likelihood of reporting scams to law enforcement. This information can help in understanding the reporting behaviour and designing targeted interventions to encourage scam reporting among different demographic groups and regions.
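Log-odds coefficients are easier to interpret once exponentiated into odds ratios. The Python sketch below applies that standard conversion to the four coefficients quoted above; the dictionary keys reuse the labels from the text, and the percentage-change column is an illustrative addition rather than part of the original analysis.

```python
import math

# Coefficients (log odds) as quoted in the text above.
coefficients = {
    "Victim_Age_Range": 0.256,
    "Gender Male": 0.398,
    "Type Of Fraud Y": 0.562,
    "Province State Z": -0.173,
}

# exp(beta) is the multiplicative change in the odds of reporting
# for a one-unit increase in the predictor (or versus the reference level).
odds_ratios = {name: math.exp(beta) for name, beta in coefficients.items()}

for name, ratio in odds_ratios.items():
    change = (ratio - 1.0) * 100.0
    print(f"{name}: odds ratio = {ratio:.3f} ({change:+.1f}% change in odds)")
```

An odds ratio above 1 corresponds to a positive coefficient and higher odds of reporting; a ratio below 1 corresponds to a negative coefficient. These are changes in odds, not in probability.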
| Variables | Df | Sum Sq | Mean Sq | F value | Pr(>F) |
|---|---|---|---|---|---|
| Province_State | 12 | 150 | 13 | 51 | 3.4e-123 |
| Residuals | 8.4e+04 | 2.1e+04 | 0.25 | | |
Figure 11: Relationship between Number of Victims and Outcome of Fraud
Figure 12: Individual Scams Over Time
Figure 13: Individual Scams Over Time
Figure 14: Months when no Incident Reported
Figure 15: Distribution of Complaint by Month
The comprehensive analysis of fraud and cybercrime complaints in Canada from 2021 to 2024 reveals significant insights into the demographic and regional patterns of victimization and reporting behaviours. Through a series of statistical tests, including ANOVA and logistic regression, we have identified critical factors that influence financial losses and the likelihood of reporting fraud.
This study provides a detailed analysis of the demographic and regional patterns in fraud and cybercrime complaints in Canada, highlighting significant factors that influence financial losses and reporting behaviours. The findings underscore the importance of targeted awareness campaigns, regional strategies, and enhanced reporting mechanisms to combat fraud effectively. Future research should focus on understanding the behavioural aspects of reporting and the regional factors contributing to fraud patterns to inform more effective policy measures.